Skip to content

PLT-460: Distribution samplers (uniform, zipfian θ)#46

Merged
bdchatham merged 7 commits into
mainfrom
brandon2/plt-460-distribution-samplers
Jun 15, 2026
Merged

PLT-460: Distribution samplers (uniform, zipfian θ)#46
bdchatham merged 7 commits into
mainfrom
brandon2/plt-460-distribution-samplers

Conversation

@bdchatham

@bdchatham bdchatham commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Implements PLT-460 — the real SampleIndex samplers behind the PLT-455 Distribution type, drawing from PLT-456 seeded sub-streams.

What

  • Uniform: O(1) draw into [0,n).
  • Zipfian(θ): YCSB precomputed-zeta — zeta computed once (O(N)) and cached, O(1) per draw. θ→0 collapses to uniform, θ→1 heavy hotspot. Smallest-term-first summation for numerical stability at large N.
  • Seeded via Distribution.SetStream (mirrors GasPicker.SetStream); new dist:%d:key / dist:%d:size stream ids added to the FROZEN contract (append-only, non-perturbing). Unbound → global RNG (preserves the unseeded path).

Review (systems + idiom)

  • Zipfian PMF independently validated against true Zipf; top-k skew monotone in θ (θ=0→1%, 0.5→9.5%, 0.9→42%). Determinism + per-stream-multiset contract upheld; -race clean.
  • Fixes applied: guarded a masked eta NaN at n≤2; documented the n-must-be-stable contract; renamed a builtin-shadowing helper + a misleading field; marked the mutex-bearing struct copy-unsafe.

Decision brief: designs/sei-load-workload-modeler/PLT-460-distribution-samplers.md.

🤖 Generated with Claude Code


📐 Design: decision brief — PLT-460 · parent design

bdchatham and others added 2 commits June 12, 2026 07:53
Implement the real index samplers behind the frozen Distribution wire
format from PLT-455, bound to the PLT-456 seeded sub-streams.

Uniform: Stream.Uint64N(n) seeded, rand.Uint64N(n) unbound.

Zipfian: YCSB precomputed-zeta generator. zeta(n, theta) is computed
once per keyspace size n in O(n) and cached (summed smallest-term-first
for numerical stability at n=1e6); each draw is O(1). n arrives at
sample time, so the cache fills lazily and recomputes only if n changes.

Stream binding mirrors GasPicker.SetStream: Distribution.SetStream
type-switches the delegate; a nil stream falls back to the global RNG.
bindDistributionStreams wires the per-scenario streams in the generator.

FROZEN contract: added stream ids "dist:%d:key" / "dist:%d:size"
(append-only; documented in rng.go input #2 and streams.go).

Tests: chi-square uniformity; top-1% mass rising monotonically with
theta (1% -> 9.5% -> 42% -> 53%); per-stream determinism; seeded !=
unseeded binding guard; n=0 error; NaN/range sweep across theta; n=1e6
init+1000 draws bounded (~50ms). go build + go test -race green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Guard eta against the n<=2 NaN (denom==0) by pinning it to 0; it is
provably never read at n<=2 but a NaN in cached state is a refactor
hazard. Document the n-stability contract on SampleIndex so PLT-465 can
not silently trigger per-draw O(n) zeta recomputes. Rename thetaPow2 ->
halfPowTheta (it holds 0.5^theta), inline the RNG fallback to mirror
UniformDistribution/gas.go, and mark the struct not copy-safe.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@cursor

cursor Bot commented Jun 12, 2026

Copy link
Copy Markdown

PR Summary

Medium Risk
Changes core workload randomness and replay contract (new frozen stream ids); zipfian uses float math and mutex caching where incorrect n stability could hurt throughput, but scope is load-test config rather than auth or payments.

Overview
Replaces stub SampleIndex implementations with real uniform and YCSB zipfian(θ) keyspace index samplers, wired for deterministic replay via Distribution.SetStream and per-scenario RNG sub-streams.

Uniform draws in [0, n) with equal probability (seeded Uint64N or global RNG). Zipfian precomputes zeta(n, θ) once per stable n, caches O(1) draw constants behind a mutex, and skews mass toward low indices as θ rises; n == 0 now errors instead of returning 0 silently. JSON unmarshaling no longer embeds stream state—streams bind at generator startup like gas pickers.

The generator adds bindDistributionStreams for KeyDistribution / SizeDistribution using new append-only frozen stream ids dist:%d:key and dist:%d:size. Package config/doc.go documents wire format, math, and reproducibility contracts. Tests cover determinism, statistical sanity, zipfian performance after warmup, cache invalidation on n change, and numerical edge cases.

Reviewed by Cursor Bugbot for commit 3a673df. Bugbot is set up for automated code reviews on this repo. Configure here.

- Drop the pointless JSON unmarshal into the field-less UniformDistribution
  (its only field, the seeded stream, is bound via SetStream, not JSON) — SA9005.
- Remove the pointless math.IsNaN on a uint64-derived float in the zipfian test;
  the in-range assertion is the real guard against a NaN-derived draw — SA4015.

Caught by golangci-lint (CI gate); local build+test+vet did not run it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
bdchatham and others added 2 commits June 12, 2026 13:32
Move the dense distribution narrative (zipfian zeta math, precompute-once
design, numerical stability, frozen wire format, seeded-stream reproducibility)
into a new package-level doc.go and lean distribution.go's comments to terse
pointers. No behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…parallelism

- config/doc.go: 'sub-stream' was split across lines, rendering as 'sub- stream'
  under go doc; reflow to 'substream'.
- distribution.go: restore the '(no Name)' gloss on SampleIndex for parallelism
  with SetStream and the package doc.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@bdchatham bdchatham requested review from amir-deris and masih June 12, 2026 22:48
@amir-deris

Copy link
Copy Markdown
  1. zeta recompute on n change — documented, but worth a runtime warning (or at least a test assertion) if n changes across draws for the same sampler, since the caller contract says it shouldn't.
    Right now a silently-changing n just silently tanks performance with no signal.
  2. SampleIndex on UniformDistribution takes a pointer receiver — changed from value receiver in the stub. That's correct (needs to read u.stream), but UniformDistribution doesn't have the copy-unsafe
    warning that ZipfianDistribution does. It's not mutex-bearing so it's technically safe to copy (the stream pointer would alias, not be lost), but adding a note for consistency wouldn't hurt.
  3. TestSampleIndexSeededDiffersFromUnseeded — this test uses the global rand RNG for the "unseeded" path. On a fixed seed run the global RNG could theoretically match — though the comment
    acknowledges the probability is ~0. Acceptable in practice.
  4. TestZipfianInitCostBounded — the 2-second budget for n=1e6 is very generous. On slow CI machines this is fine, but it means the test won't catch a real performance regression until it's ~100×
    slower than intended. A tighter bound (say 500ms) with a comment justifying it would be more useful as a sentinel.
  5. Stream IDs added to the frozen contract — correctly documented in both rng.go and streams.go. The append-only annotation is clear. No concerns.

Lock the n-stability contract with a test that pins recompute-on-n-change
(config is a pure library; std log lives only in service layers, so a hot-path
sampler warning is the wrong fit). Note UniformDistribution as copy-safe for
parity with the ZipfianDistribution note. Tighten the init-cost sentinel from
2s to 500ms (~16x over the measured ~30ms) so a real per-draw regression trips.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@bdchatham

Copy link
Copy Markdown
Contributor Author

Thanks @amir-deris — addressed in 9226a2b:

  1. zeta recompute on n change — added a test assertion (TestZipfianRecomputesOnNChange) that pins the recompute-on-n-change behavior, rather than a runtime warning. Rationale: config/ is a pure wire-types/library package with zero logging (all log.Printf lives in the service layers — sender/funder/generator/main/stats); adding logging to a sampler hot-path would be the smell you flagged and the only logging in the package. The test locks the contract so a refactor that drops the n-change check is caught. If you'd rather have a runtime signal (e.g. a metrics counter on recompute, since the caller is service-layer), say the word and I'll add it there.
  2. UniformDistribution copy note — added for parity: // copy-safe: holds no mutex; the *rng.Stream pointer aliases on copy.
  3. TestZipfianInitCostBounded — tightened 2s → 500ms (measured ~30ms init+1000 draws; ~16× headroom, non-flaky under -race, trips well before an O(n)-per-draw regression), with a justifying comment.
  4. Points 3 & 5 (unseeded-RNG-match probability ~0; frozen stream-ids) — left as-is per your "acceptable / no concerns."

All green: make lint 0 issues, go test -race -count=5 ./config/... (15 runs) clean. Re-requesting your review since the push dismissed the approval.

@bdchatham bdchatham requested a review from amir-deris June 15, 2026 16:48
The 500ms bound timed the one-time O(n) zeta precompute together with
1000 draws; that init ran ~1.08s in CI under -race on a loaded runner,
making any tight absolute bound over it flaky.

Move the init out of the timed window via a warmup draw, then time only
the 1000 post-warmup O(1) draws against a 200ms bound. That bound sits
far below an O(n)-per-draw regression (~minutes) yet well above CI noise
for O(1) work, so it stays a meaningful, non-flaky sentinel.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@bdchatham bdchatham merged commit 3d4b234 into main Jun 15, 2026
4 checks passed
@bdchatham bdchatham deleted the brandon2/plt-460-distribution-samplers branch June 15, 2026 17:13
bdchatham added a commit that referenced this pull request Jun 15, 2026
…ngci.yml (#49)

Closes the gap that let staticcheck violations slip to CI on #46/#48:
our local loop ran `build`+`test`+`vet` (no `golangci-lint`), and CI's
linter was unpinned (`latest`).

## What (parity + determinism, not a CI rewrite)
- **`make verify`** = `lint` + `test` + `check-bindings` — runs exactly
what CI gates, in one command. Run it before pushing.
- **Pinned golangci-lint v2.12.2** across all three surfaces: the
`build-and-test.yml` action (`latest`→`v2.12.2`), a Makefile
`GOLANGCI_VERSION` + `make install-lint`, and a new **`.golangci.yml`**
that freezes the enabled linter set (v2.12.2 defaults: errcheck, govet,
ineffassign, staticcheck, unused). `make lint` warns if your PATH binary
drifts.
- README "Before you push" section documenting the flow.

## Why
`go vet` doesn't run staticcheck; CI's golangci-lint does — and
`version: latest` drifts (local 2.12.1 vs CI 2.12.2). Pinning + a single
`verify` target makes local results match CI deterministically.

## Review
Idiom: clean (mirrors the existing `SOLC_VERSION`/`GETH_VERSION` pinning
convention). Security: net supply-chain improvement (pinning narrows
drift; `go install @v2.12.2` is checksum-DB-backed) — no
token/permission/trigger changes.

Tracked: PLT-474. (Deferred, noted in review: SHA-pinning the `@v7`
action major — pre-existing repo convention, repo-wide pass.)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants